TODO

pipe and junction

Version

Version: 0.1 Date: 3.4.2016

Introduction

FiPi stands for Financial Processing instructions. FiPi is a domain-specific language (DSL) that lets you define data sources and pre-processing steps for financial timeseries. For example, you could define a timeseries AAPL as the daily close price of the Apple stock, sourced from Bloomberg and adjusted for dividends.

FiPi is lightweight and technology-neutral.

FiPi Benefits

The main benefits of using FiPi are:

  • formalisation: FiPi forces you to formalise the data definitions used in financial software systems
  • configurability: FiPi allows you to define new instruments, data-sources, etc. without having to change the source code
  • modularisation: FiPi lets you define re-usable modules of pre-processing steps performed on acquired raw data

Artefacts and Components

FiPi is a standard, and as such is no more than a definition. In practice, you will need a few components to make FiPi work for you. Specifically, a typical setup could consist of:

  • one or more FiPi Contexts, each defined in a YAML file. The contexts define the timeseries you are interested in for your software application
  • one or more software libraries, providing the functionality to download, validate, and transform your timeseries. These libraries can be your own, or third-party open source or commercial libraries.
  • a FiPi interpreter, mapping the FiPi instructions to actual code in your libraries

Then, from within the code of your software application, you will make a call against the FiPi interpreter’s API, and in return get a timeseries object for a given ID.

This document is mostly about the FiPi Context: What is it? How do you define a context file? Etc.

Concepts

Timeseries

FiPi is all about financial timeseries. In FiPi, a timeseries is a well-defined entity with the following two properties:

  • it has an ID
  • it has a semantic meaning

So, for a given ID, say AAPL, FiPi defines exactly what AAPL refers to. For instance, AAPL is then not the “Apple Inc. stock price” (whatever that refers to). Instead, it is, say, the “daily close price of the Apple Inc. stock, sourced from Bloomberg, adjusted for dividends”.

These two properties (ID and semantic meaning) ensure that when you retrieve AAPL within your context, you know exactly what you are getting. This does not mean that two subsequent calls will return the same values, as new data or corrections to past data may result in differences.

Context

A FiPi Context is a well-defined set of timeseries.

Each FiPi Context is defined in a separate FiPi Definition File.

Within a context, each ID must be unique.

However, it is not uncommon that a single financial instrument is represented more than once in a context. For example, you might have a timeseries AAPL for the unadjusted timeseries, and AAPL_a for the adjusted timeseries.
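As a minimal sketch of these rules, the Python snippet below models a context as a container that rejects duplicate IDs. The names TimeseriesDef and Context are hypothetical and chosen for illustration only; they are not part of the FiPi standard or of any FiPi interpreter.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class TimeseriesDef:
    """Hypothetical record: an ID plus a plain-language statement of its meaning."""
    id: str
    meaning: str


@dataclass
class Context:
    """Hypothetical container for a set of timeseries definitions with unique IDs."""
    name: str
    series: Dict[str, TimeseriesDef] = field(default_factory=dict)

    def add(self, ts: TimeseriesDef) -> None:
        # Within a context, each ID must be unique.
        if ts.id in self.series:
            raise ValueError(f"duplicate ID in context {self.name!r}: {ts.id}")
        self.series[ts.id] = ts


# One instrument, represented twice under two distinct IDs.
research = Context("research")
research.add(TimeseriesDef("AAPL", "daily close of Apple Inc., unadjusted"))
research.add(TimeseriesDef("AAPL_a", "daily close of Apple Inc., adjusted for dividends"))
```

Adding a second definition with the ID AAPL to the same context raises an error, while the same ID may still appear in a different context.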

Conversely, it is also very common to have multiple contexts for a single software application. For example, you could have a context for each of the following modules:

  • data acquisition: this context defines the timeseries that need to be downloaded from data providers, together with validation and cleaning instructions
  • trading: this context defines, for example, actual futures contracts
  • research: this context could define rolled futures series, together with the instructions to stitch them together from individual futures contracts

Instructions

Often, financial data needs to be pre-processed before it can be fed into an algorithm, used for a visualization, or saved into a database.

For example, data needs to be validated, and cleaning needs to be done. In FiPi, these processing steps are called instructions. You can think of each instruction as a function, where the input is one or more timeseries, and the output is the transformed timeseries.

Examples

Some examples of instructions are:

  • data acquisition:
    • download prices from third party data providers
    • download prices from the internet
    • download prices from a proprietary in-house data-source
    • cache prices in memory with a predefined timeout
  • data validation:
    • test for missing data points
    • check age of last available data point
  • data cleaning:
    • remove outliers
    • backfill missing data points with the last available value
    • enhance a series by filling missing data with data from an alternative data source
  • data transformation:
    • derive returns from prices
    • index prices to start at 100
    • normalize a risk index to take values between -1 and +1
    • convert frequency of data, e.g. from hourly bars to daily
    • regularize the time index, e.g. by setting it always to end-of-month for economic time series
  • data aggregation:
    • adjust price series for dividends
    • fill history with a proxy series
    • combine multiple price series into a custom index
    • stitch multiple futures contracts into a generically rolled series, using your custom algorithm
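To make the idea of an instruction as a function concrete, here is a rough Python sketch of three of the transformation examples above. It assumes a series is a plain list of floats with no missing values; the function names are illustrative and not prescribed by FiPi.

```python
from typing import List


def index_to_100(prices: List[float]) -> List[float]:
    """Rescale prices so that the first value is 100."""
    base = prices[0]
    return [100.0 * p / base for p in prices]


def simple_returns(prices: List[float]) -> List[float]:
    """Period-over-period simple returns; one element shorter than the input."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def normalize_to_unit_range(values: List[float]) -> List[float]:
    """Min-max scale values so that they lie between -1 and +1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


prices = [50.0, 55.0, 44.0]
indexed = index_to_100(prices)                        # [100.0, 110.0, 88.0]
scaled = normalize_to_unit_range([0.0, 5.0, 10.0])    # [-1.0, 0.0, 1.0]
returns = simple_returns(prices)
```

Each function takes a series and returns a transformed series, which is what allows instructions to be chained.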

Instruction Semantics

FiPi instructions carry no predefined semantic meaning. In other words, FiPi does not have a predefined set of instructions from which you can choose. Instead, each FiPi interpreter defines how it maps an instruction to an actual function. (See more about FiPi interpreters below.)

This can be seen as both good and bad:

  • It is bad for portability, as your definition/implementation of, say, Backfill might not correspond to someone else’s
  • It is good for flexibility, as you are free to implement any instruction you can come up with
  • It makes FiPi very lightweight, yet powerful, as you can use any pre-existing functionality and map it to an instruction

Why did we choose this approach? As much as we value portability, we gave it less weight than flexibility in FiPi. There are thousands of price sources, cleaning routines, etc. Organizations have invested large amounts to create libraries that perform these routines, and piggy-backing on them is the highest priority for FiPi.

Also, if portability is needed, there are sound strategies to ensure it. For example, you can write a library with implementations of your instructions, and distribute it together with your FiPi files.
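One possible way to build such a library, sketched in Python under the assumption that an interpreter looks up instructions by name, is a small registry. REGISTRY, instruction and run are hypothetical names, and the Backfill shown here is just one possible definition of that instruction.

```python
from typing import Callable, Dict, List, Optional

Series = List[Optional[float]]  # assumption: None marks a missing data point
Instruction = Callable[[Series], Series]

# Hypothetical registry mapping instruction names to implementations.
REGISTRY: Dict[str, Instruction] = {}


def instruction(name: str) -> Callable[[Instruction], Instruction]:
    """Decorator that registers a function under an instruction name."""
    def register(func: Instruction) -> Instruction:
        REGISTRY[name] = func
        return func
    return register


@instruction("Backfill")
def fill_with_last(values: Series) -> Series:
    """This library's meaning of Backfill: carry the last available value forward."""
    out: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def run(name: str, series: Series) -> Series:
    """Look up an instruction by name and apply it; unknown names fail loudly."""
    if name not in REGISTRY:
        raise KeyError(f"no implementation registered for instruction {name!r}")
    return REGISTRY[name](series)


filled = run("Backfill", [1.0, None, 3.0, None])  # [1.0, 1.0, 3.0, 3.0]
```

Shipping such a module alongside the FiPi files pins down what each instruction name means for everyone who uses them.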

Conditions

With conditions, you can define what you expect as an input to an instruction (pre-condition), and what downstream instructions can expect as an output (post-condition).

This is very handy, e.g. for plausibility checking.
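As an illustration only, conditions could be attached to an instruction with a wrapper like the Python sketch below. The decorator with_conditions and the two checks are assumptions made for this example; FiPi does not prescribe how an interpreter implements conditions.

```python
import math
from functools import wraps
from typing import Callable, List

Series = List[float]
Check = Callable[[Series], bool]


def with_conditions(pre: Check, post: Check):
    """Wrap an instruction so its input and output are checked on every call."""
    def wrap(func: Callable[[Series], Series]) -> Callable[[Series], Series]:
        @wraps(func)
        def checked(series: Series) -> Series:
            if not pre(series):
                raise ValueError(f"pre-condition failed for {func.__name__}")
            result = func(series)
            if not post(result):
                raise ValueError(f"post-condition failed for {func.__name__}")
            return result
        return checked
    return wrap


def all_positive(series: Series) -> bool:
    """Pre-condition: prices must be strictly positive."""
    return all(v > 0 for v in series)


def plausible_moves(series: Series) -> bool:
    """Post-condition: no single-period log return beyond 0.5 in absolute value."""
    return all(abs(v) < 0.5 for v in series)


@with_conditions(pre=all_positive, post=plausible_moves)
def log_returns(prices: Series) -> Series:
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


ok = log_returns([100.0, 101.0, 99.0])
```

A call such as log_returns([100.0, 1000.0]) fails the post-condition, which flags the implausible jump before any downstream instruction sees it.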

Processing Tree

Multiple instructions are combined into a processing tree. There is exactly one processing tree per timeseries.

Technically, a processing tree consists of a number of nodes, where each node is an instruction.

The reason for having a processing tree (as opposed to, say, a processing pipe) is that, often, we want to aggregate multiple timeseries into a single timeseries. And here, a tree structure comes in very handy.

The processing order of a processing tree is from leaves to root.

For example, each leaf of a processing tree could define a data source, and nodes closer to the root define validation, transformation and aggregation steps. The root of the tree then represents the result of the processing tree, which typically is a single timeseries.

More specifically, a stylized example processing tree could be structured like this:
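One way to picture such a tree is the hypothetical Python sketch below. The Node class, the node names, and the hard-coded data are assumptions made for illustration; they do not reflect how any particular FiPi interpreter represents a tree.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Series = List[float]


@dataclass
class Node:
    """One instruction in the tree; its children supply its input series."""
    name: str
    func: Callable[..., Series]
    children: List["Node"] = field(default_factory=list)

    def evaluate(self) -> Series:
        # Children are evaluated first, so processing runs from leaves to root.
        inputs = [child.evaluate() for child in self.children]
        return self.func(*inputs)


def require_positive(series: Series) -> Series:
    """Illustrative validation step: pass the series through if all values are > 0."""
    if not all(v > 0 for v in series):
        raise ValueError("non-positive price found")
    return series


def equal_weight(a: Series, b: Series) -> Series:
    """Illustrative aggregation step: the average of two series, point by point."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


# Leaves: stand-ins for data sources, returning hard-coded values.
source_a = Node("LoadA", lambda: [100.0, 102.0, 104.0])
source_b = Node("LoadB", lambda: [50.0, 51.0, 55.0])

# Inner nodes validate each source; the root aggregates them into one series.
valid_a = Node("ValidateA", require_positive, [source_a])
valid_b = Node("ValidateB", require_positive, [source_b])
root = Node("CustomIndex", equal_weight, [valid_a, valid_b])

result = root.evaluate()  # [75.0, 76.5, 79.5]
```

The two branches merge at the root, which is the aggregation case that a simple processing pipe cannot express.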

FiPi Definition File

A FiPi Context is defined in a YAML file.

FiPi Interpreters

A FiPi interpreter is a piece of software that reads a FiPi Definition File and provides an API to call a processing tree in order to load and preprocess financial data.

For example, in pseudo code, this might look like this:

fipi = LoadContext("C:/temp/myfipi.yaml")
aapl = LoadData("AAPLadj", fipi)
Plot(aapl)

Reference Implementation

The reference implementation of a FiPi interpreter is written in the R programming language and can be obtained from GitHub. We hope to add FiPi interpreters in other languages in the near future. Please get in touch if you are interested in collaborating with us. As FiPi is very lightweight, implementing an interpreter is straightforward.